AITopics | auxiliary data

Collaborating Authors

auxiliary data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Deep Batch Active Learning for Regression with Imperfect Data Guided Selection

Neural Information Processing SystemsJun-22-2026, 18:51:09 GMT

Active learning (AL) reduces annotation costs by selecting the most informative samples based on both model sensitivity and predictive uncertainty. While sensitivity can be measured through parameter gradients in an unsupervised manner, predictive uncertainty can hardly be estimated without true labels especially for regression tasks, reducing the informativeness of actively selected samples. This paper proposes the concept of auxiliary data to aid the uncertainty estimation for regression tasks. With detailed theoretical analysis, we reveal that auxiliary data, despite potential distribution shifts, can provide a promising uncertainty surrogate when properly weighted. Such finding inspires our design of AGBAL, a novel AL framework that recalibrates auxiliary data losses through density ratio weighting to obtain reliable uncertainty estimates for sample selection. Extensive experiments show that AGBAL consistently outperforms existing approaches without auxiliary data across diverse synthetic and real-world datasets.

artificial intelligence, auxiliary data, machine learning, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

ACautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference

Neural Information Processing SystemsJun-15-2026, 19:57:52 GMT

Data integration approaches are increasingly used to enhance the efficiency and generalizability of studies. However, a key limitation of these methods is the assumption that outcome measures are identical across datasets - an assumption that often does not hold in practice. Consider the following opioid use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating the effect of medications for OUD on withdrawal symptom severity (not the primary outcome of either trial). While XBOT measures withdrawal severity using the subjective opiate withdrawal scale, POAT uses the clinical opiate withdrawal scale. We analyze this realistic yet challenging setting where outcome measures differ across studies and where neither study records both types of outcomes. Our paper studies whether and when integrating studies with disparate outcome measures leads to efficiency gains.

artificial intelligence, assumption, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

Add feedback

Enhancing Deep Batch Active Learning for Regression with Imperfect Data Guided Selection

Neural Information Processing SystemsJun-14-2026, 04:22:05 GMT

Active learning (AL) reduces annotation costs by selecting the most informative samples based on both model sensitivity and predictive uncertainty. While sensitivity can be measured through parameter gradients in an unsupervised manner, predictive uncertainty can hardly be estimated without true labels especially for regression tasks, reducing the informativeness of actively selected samples. This paper proposes the concept of \textit{auxiliary data} to aid the uncertainty estimation for regression tasks. With detailed theoretical analysis, we reveal that auxiliary data, despite potential distribution shifts, can provide a promising uncertainty surrogate when properly weighted. Such finding inspires our design of AGBAL, a novel AL framework that recalibrates auxiliary data losses through density ratio weighting to obtain reliable uncertainty estimates for sample selection. Extensive experiments show that AGBAL consistently outperforms existing approaches without auxiliary data across diverse synthetic and real-world datasets.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Add feedback

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Neural Information Processing SystemsJun-12-2026, 03:57:51 GMT

Global biodiversity is declining at an unprecedented rate, yet little information isknown about most species and how their populations are changing. Indeed, some90% Earth's species are estimated to be completely unknown. Machine learning hasrecently emerged as a promising tool to facilitate long-term, large-scale biodiversitymonitoring, including algorithms for fine-grained classification of species fromimages. However, such algorithms typically are not designed to detect examplesfrom categories unseen during training - the problem of open-set recognition(OSR) - limiting their applicability for highly diverse, poorly studied taxa such asinsects. To address this gap, we introduce Open-Insect, a large-scale, fine-graineddataset to evaluate unknown species detection across different geographic regionswith varying difficulty. We benchmark 38 OSR algorithms across three categories:post-hoc, training-time regularization, and training with auxiliary data, finding thatsimple post-hoc approaches remain a strong baseline. We also demonstrate how toleverage auxiliary data to improve species discovery in regions with limited data.Our results provide timely insights to guide the development of computer visionmethods for biodiversity monitoring and species discovery.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

b96ce7d38339874a8704e8895f743284-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 01:50:35 GMT

artificial intelligence, machine learning, robustness, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > Canada (0.68)
North America > United States > California (0.46)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Information Technology (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Collaborative Learning via Prediction Consensus

Neural Information Processing SystemsApr-24-2026, 09:17:01 GMT

We consider a collaborative learning setting where the goal of each agent is to improve their own model by leveraging the expertise of collaborators, in addition to their own training data. To facilitate the exchange of expertise among agents, we propose a distillation-based method leveraging shared unlabeled auxiliary data, which is pseudo-labeled by the collective. Central to our method is a trust weighting scheme that serves to adaptively weigh the influence of each collaborator on the pseudo-labels until a consensus on how to label the auxiliary data is reached. We demonstrate empirically that our collaboration scheme is able to significantly boost individual models' performance in the target domain from which the auxiliary data is sampled. At the same time, it can provably mitigate the negative impact of bad models on the collective. By design, our method adeptly accommodates heterogeneity in model architectures and substantially reduces communication overhead compared to typical collaborative learning methods.

agent, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Communications > Collaboration (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.90)

Add feedback

Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

Liu, Meitong, Jung, Christopher, Li, Rui, Feng, Xue, Zhao, Han

arXiv.org Machine LearningMar-31-2026

In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width $q \leq K$, where $K$ is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.

artificial intelligence, equation, machine learning, (16 more...)

arXiv.org Machine Learning

2603.28739

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

Add feedback